Cross-lingual Name Tagging and Linking for 282 Languages

Authors

  • Xiaoman Pan
  • Boliang Zhang
  • Jonathan May
  • Joel Nothman
  • Kevin Knight
  • Heng Ji

Abstract

The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for 282 languages that exist in Wikipedia. Given a document in any of these languages, our framework is able to identify name mentions, assign a coarse-grained or fine-grained type to each mention, and link it to an English Knowledge Base (KB) if it is linkable. We achieve this goal by applying a series of new KB mining methods: generating “silver-standard” annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining word translation pairs from cross-lingual links. Both name tagging and linking results for 282 languages are promising on Wikipedia data and non-Wikipedia data. All the data sets, resources and systems for 282 languages are made publicly available as a new benchmark.
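
As a rough illustration of the annotation-transfer idea in the abstract, the sketch below projects English entity types onto anchor-link spans of a non-English sentence through cross-lingual links. It is a minimal sketch under stated assumptions, not the authors' pipeline: the tables `LANGLINKS` and `EN_TYPES`, the function `project_tags`, the BIO tag scheme, and the toy German data are all hypothetical.

```python
# Hypothetical toy tables; none of this data comes from the paper.
# Target-language page title -> English page title (cross-lingual link).
LANGLINKS = {"Vereinte Nationen": "United Nations", "München": "Munich"}
# English page title -> coarse entity type (assumed to come from KB properties).
EN_TYPES = {"United Nations": "ORG", "Munich": "GPE"}


def project_tags(tokens, anchors):
    """Return BIO tags for `tokens`, projected from English entity types.

    anchors: list of (start, end, target_title) token spans, end exclusive,
    standing in for the anchor links found in a Wikipedia sentence.
    """
    tags = ["O"] * len(tokens)
    for start, end, title in anchors:
        en_title = LANGLINKS.get(title)
        ent_type = EN_TYPES.get(en_title)
        if ent_type is None:
            continue  # no cross-lingual link or no known type: leave as O
        tags[start] = "B-" + ent_type
        for i in range(start + 1, end):
            tags[i] = "I-" + ent_type
    return tags


tokens = ["Die", "Vereinten", "Nationen", "tagen", "in", "München"]
anchors = [(1, 3, "Vereinte Nationen"), (5, 6, "München")]
print(list(zip(tokens, project_tags(tokens, anchors))))
```

Anchors whose target page has no English counterpart, or whose English page has no known type, are simply left untagged here. The self-training and topic-selection refinement steps mentioned in the abstract are not modeled.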

Similar resources

Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning

In this paper we tackle a challenging name tagging problem in an emergent setting: the tagger needs to be complete within a few hours for a new incident language (IL) using very few resources. Inspired by observing how human annotators attack this challenge, we propose a new expectation-driven learning framework. In this framework we rapidly acquire, categorize, structure and zoom in on ILspecif...

Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources

Training a POS tagging model with cross-lingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables ...

RPI BLENDER TAC-KBP2016 System Description

We used the Stanford CoreNLP toolkit (Manning et al., 2014b) for English name tagging. To extract name mentions from Chinese and Spanish documents, we use bi-directional LSTM (Long Short-Term Memory) networks, which can leverage long-distance features. The inputs to the networks are pretrained word embeddings and randomly generalized character embeddings. Both word embedding and character embeddings...

Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources

Agenda
  • EC-Joint Research Centre (JRC) – Who we are
  • Monolingual plagiarism detection (PD) work at the JRC
  • Cross-lingual similarity calculation at the JRC
  • Named entity (NE) matching across languages
  • Linking related news items across languages
  • Identifying translations of documents
  • JRC's multilingual tools and resources
  • Summary

JRC-Who we are
  • European Commission (scientific-techni...

Error Analysis of Cross-lingual Tagging and Parsing

We thoroughly analyse the performance of cross-lingual tagger and parser transfer from English into 32 languages. We suggest potential remedies for identified issues and evaluate some of them.

Publication date: 2017